Latent drop-out transitions in quantile regression
Abstract


Longitudinal data are characterized by the dependence between observations coming
from the same individual. From a regression perspective, such dependence can be usefully
ascribed to unobserved features (covariates) specific to each individual. On these grounds,
random parameter models with time-constant or time-varying structure are well established
in the generalized linear model context. In the quantile regression framework, specifications
based on random parameters have only recently attracted growing interest. We start from
the recent proposal by Farcomeni (2012) on longitudinal quantile hidden Markov models,
and extend it to handle a potentially informative missing data mechanism. In particular, we
focus on monotone missingness, which may lead to selection bias and, therefore, to unreliable
inferences on model parameters. We illustrate the proposed approach by re-analyzing a
well-known dataset on the dynamics of CD4 cell counts in HIV seroconverters and by means of
a simulation study.


1 Introduction 


Quantile regression has become a standard tool to model the distribution of a continuous
response variable as a function of a set of observed covariates. When the interest lies not only
in the center of the response distribution and/or when the observed data may include some
outliers, quantile regression represents an interesting alternative to standard mean regression.
During the last few years, the basic homogeneous quantile regression model (Koenker and
Bassett, 1978) has been extended to deal with longitudinal responses. To handle the dependence
between measurements taken over time on the same individual, unit-specific, time-constant,
random parameters can be added to the model specification (see e.g. Geraci and Bottai, 2007,
2014). A potential alternative is to consider time-varying random parameters. In this
perspective, by extending standard hidden Markov models (Wiggins, 1973), Farcomeni (2012)
proposes a linear quantile hidden Markov model with a random intercept that varies over time
according to a first-order hidden Markov chain. For a general treatment of hidden Markov
models (HMMs) for longitudinal data, see Bartolucci et al. (2013) and references therein.

A common feature of longitudinal studies is that individuals may leave the study before its end.
Thus, variable-length individual sequences represent a further challenge, since not all
individuals have the same weight in building up the log-likelihood function. A major problem is
the so-called informative missingness: even after conditioning on the observed covariates and
responses, the selection of units in the study may still depend on future, unmeasured, responses.
When ignored, this missing data generating mechanism may severely bias parameter estimates and
lead to misleading conclusions. Following the proposals by Roy (2003) and Roy and Daniels
(2008), we consider a pattern mixture representation (Little and Wang, 1993) and develop a
linear quantile hidden Markov model with latent drop-out classes. The idea behind such a model
is that, after conditioning on the observed covariates, differences between sample units arise
due to unobserved heterogeneity. Random parameters varying over time according to a hidden
Markov chain capture differences related to the dynamics of omitted covariates. A further
source of unobserved heterogeneity may be represented by sub-samples of individuals
characterized by a different propensity to drop out from the study. These sub-populations
are identified by adding to the model a latent multinomial variable, whose ordered categories
directly influence the Markov transition matrix.

The paper is structured as follows: in section 2, the linear quantile hidden Markov model is
briefly reviewed. In section 3, we extend this proposal in a pattern mixture perspective, by
considering latent drop-out classes to capture individual-specific propensities to leave the study.
The modified EM algorithm for parameter estimation is discussed in section 4; the proposed
method is applied in section 5 to a well-known benchmark multi-center longitudinal study on
the time progression of CD4 cell counts in HIV seroconverters. Section 6 discusses the results
of a simulation study. The last section contains concluding remarks and outlines potential
future research lines.


2 Linear quantile hidden Markov models 

Let us suppose a longitudinal study collects repeated measures of a continuous response
variable $Y_{it}$ on a sample of $i = 1, \dots, n$ subjects at time occasions $t = 1, \dots, T$. To
account for dependence between measurements on the same statistical unit, a standard approach is
to specify a conditional model for the responses, which are assumed to be independent, conditional
on a set of individual-specific latent variables. In the context of generalized linear models for
longitudinal responses, such latent effects may be either time-constant, as in mixed models
(Laird and Ware, 1982), or time-varying, as in hidden Markov models (Wiggins, 1973). For
a combination of both, see Altman (2007) and Maruotti (2011). While this class of models
has quite a long history in the generalized linear model framework, its scope has only recently
been broadened to quantile regression, see Geraci and Bottai (2007) and Geraci and Bottai
(2014). Models with time-varying parameters have been introduced by Farcomeni (2012) to
model the (conditional) quantiles of a longitudinal response. This proposal (in the following
lqHMM) is based on the existence of two related processes: a latent process with a Markov
structure and an observed measurement process, whose parameters are defined by the current
state of the hidden Markov chain. Conditional on the state occupied at a given time occasion,
the longitudinal observations from the same individual are assumed to be independent (local
independence assumption).

Let us consider a quantile $\tau \in (0,1)$, and denote by $S_{it}(\tau)$ a quantile-specific,
homogeneous, first order, hidden Markov chain. The chain takes values in the finite set
$\mathcal{S}(\tau) = \{1, \dots, m(\tau)\}$; $\delta(\tau)$ and $Q(\tau)$ are the initial probability
vector and the transition probability matrix of the chain, respectively. The lqHMM can be
specified as follows:

$$
Y_{it} \mid S_{it} \sim \mathrm{ALD}\big(\mu_{it}(s_{it}, \tau), \sigma(\tau), \tau\big), \qquad
\mu_{it}(s_{it}, \tau) = \alpha(s_{it}, \tau) + x_{it}'\beta(\tau) \qquad (1)
$$


where $\mu$, $\sigma$ and $\tau$ are the location, the dispersion and the skewness parameters of
the asymmetric Laplace distribution. The location parameter is linear in the time-varying
intercept, $\alpha(s_{it}, \tau)$, and in the vector of fixed effects $\beta(\tau)$. The assumption
that the response variable has an asymmetric Laplace distribution, see Geraci and Bottai (2007),
is made to recast standard quantile loss optimization within a maximum likelihood perspective.
Moving from the random intercept to the more general random coefficient framework, we may write

$$
\mu_{it}(s_{it}, \tau) = x_{it}'\beta(\tau) + z_{it}'\alpha(s_{it}, \tau)
$$

where $\beta(\tau)$ summarizes the fixed effect of observed covariates on the $\tau$-th (conditional)
quantile of the response distribution, while $\alpha(s_{it}, \tau)$ represents the individual-specific
effect associated with a subset $z_{it}$ of $x_{it}$ for an individual in state $s_{it}$ at time
occasion $t$. Based on the modelling assumptions, the individual contribution to the observed data
likelihood can be written as follows:

$$
f_Y(y_i) = \sum_{s_i} f_{Y \mid S}(y_i \mid s_i)\, f_S(s_i) \qquad (2)
$$

Obviously, this framework leads to quite a general structure of association between longitudinal
measurements. However, this model cannot properly handle incomplete sequences due to an
informative missing data process (Little and Rubin, 2002). In the next section, we extend such
a model specification to account for individual differences in the propensity to leave the study.
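To make the role of the asymmetric Laplace working likelihood concrete, the following minimal sketch evaluates the ALD log-density through the quantile check function and computes the individual likelihood of equation (2) by brute-force enumeration of all hidden paths. It is an illustration under stated assumptions rather than the authors' implementation: the function names (`check_loss`, `ald_logpdf`, `individual_likelihood`), the random-intercept form of the location, and all numerical values are hypothetical, and the enumeration is only feasible for very short sequences.

```python
import math
from itertools import product


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_logpdf(y, mu, sigma, tau):
    """Log-density of ALD(mu, sigma, tau): log(tau(1-tau)/sigma) - rho_tau((y-mu)/sigma)."""
    return math.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)


def individual_likelihood(y, xb, alpha, delta, Q, sigma, tau):
    """Equation (2) by brute force: sum over every hidden path s_i.

    y[t]: responses; xb[t]: fixed part x_it' beta; alpha[h]: state intercepts;
    delta[h]: initial probabilities; Q[k][h]: transition probabilities.
    """
    m, T = len(alpha), len(y)
    total = 0.0
    for path in product(range(m), repeat=T):
        # Probability of the hidden path under the Markov chain
        prob = delta[path[0]]
        for t in range(1, T):
            prob *= Q[path[t - 1]][path[t]]
        # Local independence: the conditional density factorises over time
        logdens = sum(
            ald_logpdf(y[t], alpha[path[t]] + xb[t], sigma, tau) for t in range(T)
        )
        total += prob * math.exp(logdens)
    return total


# Toy values, chosen only for illustration
y = [5.9, 6.1, 5.2]
xb = [0.1, 0.0, -0.1]
lik = individual_likelihood(
    y, xb, alpha=[5.0, 6.0], delta=[0.3, 0.7],
    Q=[[0.8, 0.2], [0.1, 0.9]], sigma=0.5, tau=0.25,
)
```

Because the ALD log-density equals a constant minus the scaled check loss, maximizing this likelihood in the location parameters is equivalent to minimizing the usual quantile loss; the number of paths grows as $m^T$, which is why recursive schemes are used in practice.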


3 Handling informative missingness 

Let us consider a measurement process affected by monotone missingness: for each unit
$i = 1, \dots, n$, the measurements are available at time points $t = 1, \dots, T_i$ only, with
$T_i \le T$. Let us denote by $R_{it}$ the missing data indicator variable, which is equal to 1 if
the $i$-th subject is not available at the $t$-th occasion. Since we consider monotone
missingness, $R_{it} = 1 \Rightarrow R_{it'} = 1$ for all $t' > t$. When the drop-out is informative,
the missing data process needs to be properly modelled; otherwise, parameter estimates may be
unreliable. The drop-out is defined to be informative when, conditional on observed responses and
covariates, the missing data process still depends on the current, unobserved, values and/or when
parameter distinctiveness between the distribution of $Y$ and $R$ does not hold, see Little and
Rubin (2002) for a general treatment.
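The monotone pattern described above can be checked mechanically. The sketch below is illustrative only: the helper names `is_monotone` and `time_to_dropout` are hypothetical, and it assumes the indicator is coded as in the text (1 = not available, 0 = observed).

```python
def is_monotone(r_row):
    """True if, once the indicator equals 1 (missing), it never returns to 0."""
    seen_missing = False
    for r in r_row:
        if r == 1:
            seen_missing = True
        elif seen_missing:  # an observed value after a missing one
            return False
    return True


def time_to_dropout(r_row):
    """Number of observed occasions T_i under monotone missingness."""
    if not is_monotone(r_row):
        raise ValueError("missingness pattern is not monotone")
    return sum(1 for r in r_row if r == 0)


# Toy indicator matrix: one row per unit, one column per occasion
R = [
    [0, 0, 0, 0],  # completer
    [0, 0, 1, 1],  # drops out after the second occasion
    [0, 1, 0, 1],  # intermittent missingness, not monotone
]
flags = [is_monotone(row) for row in R]
T_obs = [time_to_dropout(row) for row in R if is_monotone(row)]
```

Units with intermittent missingness fall outside the setting considered here and would need separate treatment.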

In these cases, a more general model should be defined. Few attempts to handle informative
missingness have been made in the quantile regression framework; Lipsitz et al. (1997) and Yi
and He (2009) suggest a GEE approach (Liang and Zeger, 1986), while Farcomeni and Viviani
(2015) consider a joint model (JM) representation, see Rizopoulos (2012). While this latter
approach is an elegant way to handle dependence between longitudinal responses and missingness,
JMs require the distribution for the missing data process to be completely specified and
this often represents a delicate matter. Here, we focus on pattern mixture models (PMMs), see
Little and Wang (1993). The rationale for pattern mixture models is that each subject has its
own propensity to drop out from the study. Individuals with similar propensities share some
common observed/unobserved features and the model for the longitudinal response is given
by a mixture over these patterns. Pattern mixture models do not need the distribution of the
missing data generating process to be specified, but are often overparameterized. This issue
may be (at least potentially) solved by defining appropriate identifying restrictions. Latent
drop-out (LDO) models (Roy, 2003; Roy and Daniels, 2008) represent a viable solution. In
such a specification, a limited number of LDO classes is considered to avoid
overparameterization; sample units belonging to the same LDO class share common unobserved
characteristics that influence, either directly or indirectly, the response variable distribution.
To explain our proposal, let $\zeta_i(\tau) = (\zeta_{i1}(\tau), \dots, \zeta_{iG}(\tau))$ be a
(quantile-specific) multinomial random variable with component $\zeta_{ig}(\tau) = 1$ if subject $i$
belongs to the $g$-th drop-out class and zero otherwise. These categories represent ordered
propensities to drop out; that is, we assume that, for $g > g'$, the propensity to drop out for
units with $\zeta_{ig}(\tau) = 1$ is lower than the propensity of units with $\zeta_{ig'}(\tau) = 1$.
For a generic quantile $\tau \in (0,1)$, this ordering is specified through the following model:

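$$
\Pr\left(\sum_{l=1}^{g} \zeta_{il}(\tau) = 1 \,\Big|\, T_i\right)
= \frac{\exp\{\lambda_{0g}(\tau) + \lambda_1(\tau)\, T_i\}}{1 + \exp\{\lambda_{0g}(\tau) + \lambda_1(\tau)\, T_i\}}, \qquad (3)
$$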


under the constraint $\lambda_{0g}(\tau) < \lambda_{0g'}(\tau)$, $g < g'$. The probability of
belonging to one of the first $g$ classes is, thus, modelled as a monotone function of the time to
drop-out; the probability of a specific class is obtained as the difference between the cumulative
probabilities implied by two adjacent cumulative logits (see Agresti). The cumulative
probabilities and the above constraints imply that the distribution of $\zeta_i(\tau)$ for
different values of $T_i$ is stochastically ordered. We assume that the latent drop-out class
variable summarizes all the dependence between the longitudinal response and the missing data
mechanism; conditional on the drop-out class, the two processes are independent. Clearly,
LDO classes may influence the response variable distribution in several ways: for example,
they may produce class-specific changes to the fixed effect parameter vector, as in Marino et al.
(2015). Alternatively, they may produce changes in the locations of the hidden Markov chain,
thus giving rise to an LDO-specific support for the time-varying random parameters. Here, we
discuss a further alternative; we assume that the LDO representation produces a change in the
matrix of transition probabilities. That is, we consider a (quantile-specific) homogeneous, first
order, hidden Markov chain, $S_{it}(\tau)$, taking values in the finite set
$\mathcal{S}(\tau) = \{1, \dots, m(\tau)\}$. The corresponding initial probability vector is assumed
to be constant among LDO classes and is denoted by $\delta(\tau)$, while the transition probability
matrix $Q(g; \tau)$ is specific to each LDO class, $g = 1, \dots, G$. This approach shares some
features with the proposal by Maruotti and Rocci (2012), where latent class-specific transitions
are considered in the framework of standard HMMs. The proposed specification covers a range of
situations which is more general than a simple change in the location parameters of the hidden
Markov chain. By allowing $Q(\cdot)$ to depend on $g$, we may define states that are “visited” only
by individuals in a given LDO class, that is, latent class-specific parameter values. The proposed
model is in line with Bartolucci and Farcomeni (2015) and Maruotti (2015), where standard HMMs are
extended to deal with informative drop-outs. In more detail, Bartolucci and Farcomeni (2015)
discuss a shared parameter model with time-constant and time-varying (discrete) random intercepts
shared by the longitudinal and the missing data process. Maruotti (2015) describes a pattern
mixture approach with the Markov transition matrix being a function of the time to drop-out.
When compared with the former, our proposal does not need the distribution of the missing
data process to be specified, thus avoiding unverifiable parametric assumptions. When
compared with the latter, our approach seems to be more general and to offer greater flexibility.
Let $\Psi(\tau) = (\theta(\tau), \sigma(\tau), \delta(\tau), Q(\tau), \lambda(\tau))$, where
$\theta(\tau) = (\beta(\tau), \alpha_1(\tau), \dots, \alpha_{m(\tau)}(\tau))$ denotes the vector of
longitudinal model parameters, and let $\phi(\tau)$ be the vector of parameters indexing the
distribution of the time to drop-out, $f_T(T_i \mid \phi; \tau)$. Based on the
previous modelling assumptions, the observed individual likelihood for a generic sample unit is
obtained by marginalizing the joint distribution of the observed and the latent variables over
the hidden Markov chain and the latent drop-out class indicator. Suppressing the dependence
on model parameters to simplify the notation, the following expression holds:

$$
f_{Y,T}(y_i, T_i; \tau) = \sum_{s_i} \sum_{\zeta_i}
f_{Y,S \mid \zeta}(y_i, s_i \mid \zeta_i; \tau)\, f_{\zeta \mid T}(\zeta_i \mid T_i; \tau)\, f_T(T_i; \tau). \qquad (4)
$$

From the above equation, it is clear that the marginal distribution of the time to drop-out can
be left unspecified and ignored when maximizing the likelihood with respect to $\Psi(\tau)$;
inference may be based on the conditional distribution $f_{Y \mid T}(y_i \mid T_i; \tau)$ only.
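As a small illustration of the cumulative logit model in equation (3), the sketch below turns ordered cut-points and a common slope into LDO class probabilities by differencing adjacent cumulative probabilities. The function names (`expit`, `ldo_class_probs`) and all parameter values are hypothetical; in particular, the negative slope is an assumption made here so that longer follow-up shifts probability mass towards the higher classes.

```python
import math


def expit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ldo_class_probs(T_i, lambda0, lambda1):
    """Class probabilities pi_1..pi_G given the time to drop-out T_i.

    lambda0: the G-1 increasing cut-points of equation (3);
    lambda1: common slope on T_i.
    """
    if any(a >= b for a, b in zip(lambda0, lambda0[1:])):
        raise ValueError("cut-points must be strictly increasing")
    # Cumulative probabilities Pr(class <= g | T_i); the last one equals 1
    cum = [expit(l0 + lambda1 * T_i) for l0 in lambda0] + [1.0]
    # Each class probability is the difference of adjacent cumulative terms
    return [cum[0]] + [cum[g] - cum[g - 1] for g in range(1, len(cum))]


# Hypothetical values: G = 3 classes and a negative slope
early = ldo_class_probs(T_i=2, lambda0=[-1.0, 1.5], lambda1=-0.4)
late = ldo_class_probs(T_i=10, lambda0=[-1.0, 1.5], lambda1=-0.4)
```

The ordering constraint on the cut-points is what guarantees non-negative class probabilities, and the shared slope is what produces the stochastic ordering across values of $T_i$.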

4 Parameter estimation 


The general structure of the EM algorithm (Dempster et al., 1977) we use for parameter
estimation can be sketched as follows. To keep the notation simple, we will omit the dependence
of model parameters on the specific quantile $\tau$ we consider. Let $u_{it}(h) = I(S_{it} = h)$ be
the variable indicating that the $i$-th unit is in the $h$-th hidden state at occasion $t$ and let
$u_{it}(h, k)$ be the indicator variable for the $i$-th unit moving from the $h$-th state at
occasion $t - 1$ to the $k$-th one at $t$. Last, let $\zeta_{ig}$ be the indicator variable for
unit $i = 1, \dots, n$ in the $g$-th latent class.
For a given quantile $\tau$, the (conditional) log-likelihood for complete data is




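$$
\ell_c(\Psi) = \sum_{i=1}^{n} \Bigg\{ \sum_{h=1}^{m} u_{i1}(h) \log \delta_h
+ \sum_{t=2}^{T_i} \sum_{h=1}^{m} \sum_{k=1}^{m} \sum_{g=1}^{G} \zeta_{ig}\, u_{it}(h, k) \log q_{hk}(g)
+ \sum_{g=1}^{G} \zeta_{ig} \log \pi_g - T_i \log \sigma
- \sum_{t=1}^{T_i} \sum_{h=1}^{m} u_{it}(h)\, \rho_\tau\!\left(\frac{y_{it} - \mu_{it}(S_{it} = h)}{\sigma}\right) \Bigg\} \qquad (5)
$$

where $\pi_g$ denotes the probability of the $g$-th LDO class given $T_i$, and
$\rho_\tau(u) = u\,(\tau - I(u < 0))$ is the quantile check function.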


The E-step of the algorithm requires the computation of the expected values of the indicator
variables $u_{it}(h)$, $u_{it}(h, k)$ and $\zeta_{ig}$, conditional on the observed data and the
current parameter estimates. As is usual with hidden Markov models, such a computation is
simplified by considering the forward and backward variables (Baum et al., 1970). In the present
framework, for a generic individual in the $g$-th latent drop-out class, forward variables,
$a_{it}(h, g)$, define the joint density of the longitudinal measures up to time $t$ and the $h$-th
state at $t$:


$$
a_{it}(h, g) = f\left[y_{i1:t}, S_{it} = h \mid \zeta_{ig} = 1\right] \qquad (6)
$$


Following Baum et al. (1970), these terms can be computed recursively:

$$
a_{i1}(h, g) = \delta_h\, f_{Y \mid S}[y_{i1} \mid S_{i1} = h], \qquad
a_{it}(h, g) = \sum_{k=1}^{m} a_{it-1}(k, g)\, q_{kh}(g)\, f_{Y \mid S}[y_{it} \mid S_{it} = h]. \qquad (7)
$$

Similarly, the backward variables, $b_{it}(h, g)$, represent the density of the longitudinal
sequence from occasion $t + 1$ to the last observation, conditional on being in the $g$-th LDO
class and in the $h$-th state at time $t$:

$$
b_{it}(h, g) = f\left[y_{it+1:T_i} \mid S_{it} = h, \zeta_{ig} = 1\right]. \qquad (8)
$$

As for the forward variables, the backward variables can also be derived recursively:

$$
b_{iT_i}(h, g) = 1, \qquad
b_{it-1}(h, g) = \sum_{k=1}^{m} b_{it}(k, g)\, q_{hk}(g)\, f_{Y \mid S}[y_{it} \mid S_{it} = k]. \qquad (9)
$$


For a detailed description of the Baum-Welch algorithm, see the seminal paper by Baum et al.
(1970) and the reference monograph by Zucchini and MacDonald (2009).
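The forward and backward recursions can be sketched in a few lines for a single individual and a single LDO class. This is a minimal, unscaled illustration rather than the authors' code: the function names and toy numbers are hypothetical, `dens[t][h]` stands for the state-conditional density $f_{Y \mid S}[y_{it} \mid S_{it} = h]$ evaluated beforehand, and a practical implementation would rescale the variables to avoid underflow on long sequences.

```python
def forward(dens, delta, Q):
    """a[t][h]: joint density of y_1..y_t and S_t = h, given the LDO class."""
    m = len(delta)
    a = [[delta[h] * dens[0][h] for h in range(m)]]
    for t in range(1, len(dens)):
        prev = a[-1]
        a.append([
            sum(prev[k] * Q[k][h] for k in range(m)) * dens[t][h]
            for h in range(m)
        ])
    return a


def backward(dens, Q):
    """b[t][h]: density of y_{t+1}..y_T given S_t = h and the LDO class."""
    m, T = len(Q), len(dens)
    b = [[1.0] * m for _ in range(T)]  # the last row stays equal to one
    for t in range(T - 1, 0, -1):
        b[t - 1] = [
            sum(b[t][k] * Q[h][k] * dens[t][k] for k in range(m))
            for h in range(m)
        ]
    return b


# Toy inputs: three occasions, two hidden states, class-specific matrix Q
dens = [[0.5, 0.1], [0.2, 0.3], [0.4, 0.05]]
delta = [0.6, 0.4]
Q = [[0.9, 0.1], [0.2, 0.8]]
a = forward(dens, delta, Q)
b = backward(dens, Q)
lik = sum(a[-1])  # density of the whole sequence, given the class
```

A useful sanity check is that the sum over states of the product of forward and backward variables gives the same sequence density at every occasion.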


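Computation of the expected complete data log-likelihood, conditional on the observed data
and the current parameter estimates, leads to

$$
Q(\Psi \mid \Psi^{(r)}) = \sum_{i=1}^{n} \Bigg\{ \sum_{h=1}^{m} \hat{u}_{i1}(h) \log \delta_h
+ \sum_{t=2}^{T_i} \sum_{h,k=1}^{m} \sum_{g=1}^{G} \hat{\zeta}_{ig}\, \hat{u}_{it}(k, h \mid g) \log q_{kh}(g)
+ \sum_{g=1}^{G} \hat{\zeta}_{ig} \log \pi_g - T_i \log(\sigma)
- \sum_{t=1}^{T_i} \sum_{h=1}^{m} \hat{u}_{it}(h)\, \rho_\tau\!\left(\frac{y_{it} - \mu_{it}(S_{it} = h)}{\sigma}\right) \Bigg\} \qquad (10)
$$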


where $\hat{u}_{it}(h)$ and $\hat{\zeta}_{ig}$ represent the posterior expectations of the indicator
variables we have previously introduced. Moreover, $\hat{u}_{it}(k, h \mid g)$ denotes the posterior
probability that the $i$-th unit is in state $k$ at occasion $t - 1$ and moves to state $h$ at
occasion $t$, given that she/he belongs to the $g$-th LDO class. These posterior probabilities can
be easily obtained by exploiting the forward and backward variables as:


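$$
\hat{u}_{it}(h) = \frac{\sum_{g} a_{it}(h, g)\, b_{it}(h, g)\, \pi_g}{\sum_{h} \sum_{g} a_{it}(h, g)\, b_{it}(h, g)\, \pi_g},
$$

$$
\hat{u}_{it}(k, h \mid g) = \frac{a_{it-1}(k, g)\, q_{kh}(g)\, f_{Y \mid S}(y_{it} \mid S_{it} = h)\, b_{it}(h, g)}
{\sum_{k} \sum_{h} a_{it-1}(k, g)\, q_{kh}(g)\, f_{Y \mid S}(y_{it} \mid S_{it} = h)\, b_{it}(h, g)},
$$

$$
\hat{\zeta}_{ig} = \frac{\sum_{h} a_{iT_i}(h, g)\, \pi_g}{\sum_{g} \sum_{h} a_{iT_i}(h, g)\, \pi_g}.
$$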
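The posterior class and state probabilities can be computed directly from the class-specific forward and backward variables. The sketch below is a hedged illustration for one individual: the function name `posteriors`, the array layout `a[g][t][h]` and all numbers are assumptions, and the toy forward and backward variables are built by hand for a sequence of length two.

```python
def posteriors(a, b, pi):
    """Posterior class probabilities and marginal state probabilities.

    a[g][t][h], b[g][t][h]: forward/backward variables for LDO class g;
    pi[g]: class probabilities given the time to drop-out.
    """
    G, T, m = len(a), len(a[0]), len(a[0][0])
    lik_g = [sum(a[g][T - 1]) for g in range(G)]   # sequence density given class g
    lik = sum(pi[g] * lik_g[g] for g in range(G))  # density mixed over classes
    zeta = [pi[g] * lik_g[g] / lik for g in range(G)]
    u = [
        [sum(a[g][t][h] * b[g][t][h] * pi[g] for g in range(G)) / lik
         for h in range(m)]
        for t in range(T)
    ]
    return zeta, u


# Toy example: two occasions, two states, two classes with different transitions
delta = [0.5, 0.5]
dens = [[0.4, 0.1], [0.3, 0.2]]
Q = [[[0.9, 0.1], [0.1, 0.9]], [[0.5, 0.5], [0.5, 0.5]]]
pi = [0.3, 0.7]
a, b = [], []
for Qg in Q:
    a1 = [delta[h] * dens[0][h] for h in range(2)]
    a2 = [sum(a1[k] * Qg[k][h] for k in range(2)) * dens[1][h] for h in range(2)]
    b1 = [sum(Qg[h][k] * dens[1][k] for k in range(2)) for h in range(2)]
    a.append([a1, a2])
    b.append([b1, [1.0, 1.0]])
zeta_hat, u_hat = posteriors(a, b, pi)
```

The class posterior reweights the prior class probabilities by how well each class-specific transition matrix explains the observed sequence.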


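The M-step of the EM algorithm requires the maximization of the $Q(\cdot)$ function with respect
to model parameters. Closed form solutions are available for the parameters of the hidden
Markov process:

$$
\hat{\delta}_h = \frac{\sum_{i=1}^{n} \hat{u}_{i1}(h)}{n}, \qquad
\hat{q}_{kh}(g) = \frac{\sum_{i=1}^{n} \sum_{t=2}^{T_i} \hat{\zeta}_{ig}\, \hat{u}_{it}(k, h \mid g)}
{\sum_{i=1}^{n} \sum_{t=2}^{T_i} \sum_{h=1}^{m} \hat{\zeta}_{ig}\, \hat{u}_{it}(k, h \mid g)}. \qquad (11)
$$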


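The scale parameter of the longitudinal response distribution is estimated as

$$
\hat{\sigma} = \frac{1}{\sum_{i=1}^{n} T_i} \sum_{i=1}^{n} \sum_{t=1}^{T_i} \sum_{h=1}^{m}
\hat{u}_{it}(h)\, \rho_\tau\big(y_{it} - \hat{\mu}_{it}(S_{it} = h)\big). \qquad (12)
$$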
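The closed-form M-step updates for the initial probabilities, the class-specific transition matrix and the scale can be sketched as follows. This is an illustrative reading of the updates, not the authors' implementation: the function names, the nested-list layout of the posterior quantities and the toy inputs are all assumptions, and the transition update assumes every state receives some posterior weight (otherwise a row total would be zero).

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def update_delta(u1):
    """u1[i][h]: posterior probability of state h at the first occasion."""
    n, m = len(u1), len(u1[0])
    return [sum(u1[i][h] for i in range(n)) / n for h in range(m)]


def update_Q(zeta_g, u2_g):
    """Transition matrix for one LDO class.

    zeta_g[i]: posterior weight of the class for unit i;
    u2_g[i][t][k][h]: posterior probability of moving from k to h, given the class.
    """
    m = len(u2_g[0][0])
    num = [[0.0] * m for _ in range(m)]
    for z, seq in zip(zeta_g, u2_g):
        for trans in seq:
            for k in range(m):
                for h in range(m):
                    num[k][h] += z * trans[k][h]
    # Normalise each row so that it sums to one
    return [[num[k][h] / sum(num[k]) for h in range(m)] for k in range(m)]


def update_sigma(y, mu, u, tau):
    """Weighted average check loss; y[i][t], mu[i][t][h], u[i][t][h]."""
    total, n_obs = 0.0, 0
    for y_i, mu_i, u_i in zip(y, mu, u):
        n_obs += len(y_i)
        for y_it, mu_it, u_it in zip(y_i, mu_i, u_i):
            total += sum(w * check_loss(y_it - loc, tau) for w, loc in zip(u_it, mu_it))
    return total / n_obs


# Toy posterior quantities for two units
delta_hat = update_delta([[0.2, 0.8], [0.6, 0.4]])
Q_hat = update_Q(
    [1.0, 0.5],
    [[[[0.5, 0.1], [0.1, 0.3]]],
     [[[0.2, 0.2], [0.4, 0.2]], [[0.1, 0.1], [0.1, 0.7]]]],
)
sigma_hat = update_sigma([[1.0, 3.0]], [[[0.0], [1.0]]], [[[1.0], [1.0]]], tau=0.5)
```

The fixed effects and the drop-out model parameters have no closed form and require a numerical solver for the weighted estimating equations.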


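Parameters in the longitudinal and in the LDO class model, $(\theta, \lambda)$, are estimated by
finding the zeros of weighted score functions. For the longitudinal outcome, weights are given by
the posterior probabilities of the hidden states, $\hat{u}_{it}(h)$; the following estimating
equation holds

$$
\sum_{i=1}^{n} \sum_{t=1}^{T_i} \sum_{h=1}^{m} \hat{u}_{it}(h)\,
\frac{\partial}{\partial \theta}\, \rho_\tau\!\left(\frac{y_{it} - \mu_{it}(S_{it} = h)}{\sigma}\right) = 0. \qquad (13)
$$

For the latent drop-out model, the score function is weighted by means of the LDO class
posterior probabilities, $\hat{\zeta}_{ig}$, leading to

$$
\sum_{i=1}^{n} \sum_{g=1}^{G-1} \hat{\zeta}_{ig}\, \frac{\partial}{\partial \lambda}
\log\left\{ \frac{e^{\lambda_{0g} + \lambda_1 T_i}}{1 + e^{\lambda_{0g} + \lambda_1 T_i}}
- \frac{e^{\lambda_{0,g-1} + \lambda_1 T_i}}{1 + e^{\lambda_{0,g-1} + \lambda_1 T_i}} \right\} = 0. \qquad (14)
$$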


The E- and the M-steps are repeatedly alternated until the difference between the likelihood
values for two successive iterations is lower than a fixed constant $\epsilon > 0$, that is

$$
\ell^{(r+1)} - \ell^{(r)} < \epsilon.
$$

The algorithm is run to convergence for a given number of hidden states, $m$, and latent
drop-out classes, $G$, which we consider fixed and known. For a given combination $[m, G]$, several
starting points are used to avoid local maxima. As a result, we have a set of possible solutions,
and the final $[m, G]$-based estimates come from the model with the highest log-likelihood value
obtained over the set of starting points considered. As typically happens in the linear quantile
mixed model framework, standard errors for parameter estimates are derived by exploiting a
non-parametric block bootstrap (see e.g. Buchinsky, 1995). Bootstrap samples are obtained by
sampling individuals and retaining the corresponding longitudinal sequence, to preserve the
within-individual dependence structure.
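The block resampling step can be illustrated as follows. The sketch resamples whole individuals with replacement and keeps each longitudinal sequence intact; the function name `block_bootstrap`, the toy sequences and the summary statistic are hypothetical, and in an actual analysis the model would be refitted on every resample.

```python
import random


def block_bootstrap(sequences, rng):
    """Resample individuals with replacement, keeping each sequence intact."""
    n = len(sequences)
    return [sequences[rng.randrange(n)] for _ in range(n)]


# Toy longitudinal data: one list of responses per individual,
# with unequal lengths because of drop-out
sequences = [[6.1, 5.9, 5.7], [6.8], [5.2, 5.0], [6.4, 6.3, 6.0, 5.8]]

rng = random.Random(1)
# Replicates of a simple statistic (mean sequence length) stand in for
# the parameter estimates obtained by refitting the model
boot_lengths = [
    sum(len(s) for s in block_bootstrap(sequences, rng)) / len(sequences)
    for _ in range(200)
]
```

Resampling at the individual level, rather than at the observation level, is what preserves the within-individual dependence and the observed drop-out pattern in every bootstrap sample.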


5 Real data example: CD4 data 

To explore the empirical behaviour of the model, we consider the CD4 cell count data discussed,
among others, by Zeger and Diggle (1994). These data come from the Multicenter AIDS Cohort
Study (MACS), conducted since 1984 with the aim of analysing HIV progression over time (see
Kaslow et al., 1987). It includes nearly 5000 gay and bisexual men from Baltimore, Pittsburgh,
Chicago and Los Angeles. One of the effects of HIV is the reduction of T-lymphocytes, referred
to as CD4 cells, which play a vital role in immune function; the virus progression can be assessed
by measuring the number of CD4 cells over time.

We have considered 2376 repeated measurements coming from 369 men who were seronegative
at the beginning of the study and seroconverted during follow-up. They have been observed
from 3 years before up to 6 years after the seroconversion: each individual has been followed
for a minimum of 1 and a maximum of 12 occasions. While the time occasions are not
equally spaced, the distribution of the time elapsed between successive visits is concentrated
around 0.50 (that is, half a year) and, therefore, we may consider occasions as if they were
equally spaced. This greatly simplifies notation and estimation. At each visit, a number of
covariates have been measured together with the level of T-lymphocytes in the blood: years
since seroconversion (negative values indicate that the current CD4 measurement has been
taken before the seroconversion), age at seroconversion (centered around 30), smoking (packs
per day), recreational drug use (yes or no), number of sexual partners, and depression symptoms
as measured by the CESD scale (larger values indicate more severe symptoms). The analysis
has been conducted on the log-transformed CD4 counts, that is $\log(1 + \text{CD4 count})$.

As is often the case with longitudinal designs, some of the units in the sample leave the
study before its end, and present incomplete information. In table 1 we report the number
of individuals available at each visit; as can be seen, only a small number of individuals
present complete data records.


Table 1: Number of individuals in the study at each time occasion. 


Visit          1    2    3    4    5    6    7    8    9   10   11   12
Individuals  369  364  340  315  268  225  173  133   92   54   33   10


Figure 1 displays the mean response evolution during the follow-up, for the overall sample and
stratified by whether or not the units drop out from the study between the current and the
subsequent time occasion. As may be noticed, a progressive decrease in the CD4 counts
is observed, which is coherent with the progression of the virus. However, some differences
between the units staying and those dropping out from the study between $t$ and $t + 1$ may be
noticed. The latter present CD4 levels which are lower when compared to units
remaining in the study beyond $t + 1$, especially at the beginning of the observation window.
These findings suggest the potential presence of some form of sample selection.

Figure 1: Response variable distribution at each time occasion. Legend: Overall; Next Measure
Available; Next Measure Unavailable.

To analyse the effect of observed covariates on the HIV progression and account for the missing
data process, we have estimated a linear quantile hidden Markov model with LDO-dependent
transitions. To give some insight into the sensitivity of parameter estimates to modelling
assumptions, we compare these results with those obtained from the corresponding MAR version, the
lqHMM (Farcomeni, 2012). Since more severe HIV-related symptoms are the main target of inference,
we have decided to focus on lower CD4 count levels, that is on $\tau \in \{0.25, 0.50\}$. For a
generic quantile $\tau \in (0, 1)$, the following conditional model for longitudinal observations
has been fit:

$$
\mu_{it}(s_{it}) = \alpha(s_{it}) + x_{it}'\beta
$$

where $\alpha(s_{it})$ denotes a state-dependent random intercept, while $x_{it}$ includes two
continuous covariates (years since seroconversion and age), the dummy variable drug (baseline: no)
and three discrete variables (packs of cigarettes per day, number of sexual partners and CESD
score). Both the lqHMM and the lqHMM+QLDO have been fit for a varying number of hidden states
($m = 2, \dots, 5$) and, where applicable, for a varying number of latent drop-out classes
($G = 2, \dots, 5$). To reduce the chance of being trapped in local maxima, we have adopted a
multi-start strategy. For the hidden Markov chain, a first deterministic starting solution has
been obtained by setting initial and transition probabilities to $\delta_h = 1/m$ and
$q_{kh} = (1 + s\,\mathbb{1}(h = k))/(m + s)$, $h, k = 1, \dots, m$ (for a suitable constant $s$),
for all the LDO classes (when present). Parameters in the missing data model have been initialized
by fitting an ordered logit to the response obtained by discretizing the distribution of the
number of visits for each individual. To avoid singularities, a fraction of responses has been
randomly perturbed. Initial values for the fixed longitudinal model parameters correspond to the
maximum likelihood estimates of the linear quantile regression model for independent observations,
while the time-varying random intercept has been initialized by adding Gaussian quadrature
locations to the corresponding fixed intercept. Random starting values have been obtained by
perturbing the deterministic ones. For each model (i.e. for each combination $[m, G]$), we have
considered 30 starting points and retained the solution with the highest likelihood. In table 2
we report the corresponding AIC and BIC values for such solutions. As expected, because of the
high number of parameters in the lqHMM+QLDO formulation, both criteria suggest retaining the
solution with $m = 5$ and $G = 1$ for the quantiles we have considered. However, by looking at the
AIC values, we noticed only slight differences between the solutions $[m = 5, G = 1]$ and
$[m = 5, G = 2]$. This suggests that, despite the highly parametrized structure of the lqHMM+QLDO
formulation, model fit (as measured by the maximized log-likelihood value) is improved when
accounting for the missing data generation process. Furthermore, simulation results in section 6
show that the BIC leads, in most cases, to models with a lower (than the truth) number of LDO
classes. Based on these findings, we will consider the model $[m = 5, G = 2]$ as the potential
competitor for the MAR version (the lqHMM).
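The deterministic starting transition matrix described above, and one possible way of generating random starts from it, can be sketched as follows. The formula for $q_{kh}$ is the one given in the text; the function names, the value of $s$ and the multiplicative jitter used for the random starts are assumptions made only for illustration.

```python
import random


def initial_transition_matrix(m, s):
    """q_kh = (1 + s * 1(h == k)) / (m + s); each row sums to one by construction."""
    return [
        [(1.0 + (s if h == k else 0.0)) / (m + s) for h in range(m)]
        for k in range(m)
    ]


def perturb_rows(Q, rng, scale=0.1):
    """Random start: jitter each entry multiplicatively, then renormalise the rows."""
    out = []
    for row in Q:
        noisy = [q * (1.0 + scale * rng.random()) for q in row]
        total = sum(noisy)
        out.append([q / total for q in noisy])
    return out


# m = 3 hidden states; a larger s puts more mass on the diagonal (persistence)
Q0 = initial_transition_matrix(3, 4.0)
rng = random.Random(2015)
random_starts = [perturb_rows(Q0, rng) for _ in range(30)]
```

Each random start would then be passed to the EM algorithm, and the solution with the highest maximized log-likelihood retained.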

Table reports the estimated parameters for the longitudinal data model under the IqHMM 
and the lqHMM-|-QLDO specifications, with corresponding 95% confidence intervals (within 
brackets). The confidence intervals have been computed (using a block non-parametric boot¬ 
strap) with B = 1000 resamples. As it can be easily noticed, age and drugs play no role 
in explaining the evolution of the GD4 cell counts over time. For both models, and for all 
the analysed quantiles, more severe depression symptoms lead to a decrease in the response 
variable; as expected, increases in the time since seroconversion correspond to a reduction in 
the level of T-lymphocytes. The effect of Timesero is slightly reduced under the IqHMM with 
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Table 2: Model selection; penalized likelihood criteria for different value of m and G at different 
quantiles. 




LDO classes 


Hidden States 

1 

2 

3 

4 


T 

= 0.25 




AIC 



2 

3247.36 

3215.99 

3218.02 

3231.74 

3 

2895.26 

2876.79 

2870.91 

2890.06 

4 

2655.24 

2642.60 

2646.09 

2656.89 

5 

2550.21 

2550.92 

2556.75 

2589.15 



BIC 



2 

3298.20 

3278.56 

3292.32 

317.77 

3 

2969.56 

2978.47 

2999.97 

3046.49 

4 

2760.83 

2799.03 

2853.36 

2915.01 

5 

2694.91 

2777.74 

2865.71 

2980.23 


T 

= 0.50 




AIC 



2 

2688.11 

2664.24 

2665.12 

2672.56 

3 

2448.49 

2432.94 

2436.74 

2450.87 

4 

2310.55 

2305.78 

2308.59 

2337.79 

5 

2239.02 

2242.75 

2255.94 

2282.33 



BIC 



2 

2738.95 

2726.81 

2739.42 

2758.59 

3 

2522.79 

2534.62 

2565.80 

2607.30 

4 

2416.15 

2462.22 

2515.86 

2595.90 

5 

2383.72 

2469.57 

2564.90 

2673.41 


respect to its MNAR counterpart. Results for the remaining covariates follow. Based on the 
results reported in table smoking more cigarettes (for r = 0.25 and r = 0.50, with stronger 
effect in the former case) and having more sexual partners (r = 0.25 only) are associated to 
higher CD4 cell counts. According to Zeger and Diggle (1994), the positive effect of such risk 
factors may be due to immune response stimulation or, simply, to a form of selection bias 
with healthier men staying longer in the study that continue their usual practices. Regarding 
state-dependent intercepts, the estimates increase with the quantile level and, in all the anal¬ 
ysed models, higher CD4 cell counts correspond to “higher” hidden states. When comparing 
results obtained under the IqHMM and the IqHMM-hQLDO, no substantial differences can be 
observed; this suggests the class of models we are considering is rather robust with respect to 
possible misspecihcation of the missing data generating mechanism. However, when looking at 
the bootstrap conhdence intervals, slight differences emerge. That is, if we consider the missing 
data process, we obtain narrower intervals and, therefore, improved reliability for parameter 
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Table 3: Estimated parameters for the longitudinal data model at different quantiles. 




IqHMM 


IqHMM-kqLDO 

T = 0.25 

ai 

4.738 

(3.238 ; 4.956) 

4.728 

(3.221 ; 4.944) 

012 

5.699 

(5.395 ; 5.750) 

5.693 

(5.435 ; 5.752) 

03 

6.126 

(6.051 ; 6.164) 

6.118 

(6.073 ; 6.155) 

014 

6.509 

(6.413 ; 6.562) 

6.500 

(6.446 ; 6.549) 

as 

6.843 

(6.757 ; 6.935) 

6.832 

(6.772 ; 6.922) 

Age 

0.001 

(-0.006 ; 0.005) 

0.001 

(-0.006 ; 0.005) 

Drugs 

-0.033 

(-0.084 ; 0.068) 

-0.025 

(-0.074 ; 0.062) 

Packs 

0.082 

(0.051 ; 0.096) 

0.082 

(0.048 ; 0.095) 

Partners 

0.011 

(0.002 ; 0.018) 

0.010 

(0.000 ; 0.017) 

CESD 

-0.004 

(-0.006 ; -0.001) 

-0.004 

(-0.006 ; -0.001) 

Timesero 

-0.091 

(-0.121 ; -0.075) 

-0.089 

(-0.121 ; -0.073) 

T = 0.50 

ai 

5.628 

(5.074 ; 5.753) 

5.618 

(5.142 ; 5.751) 

0.2 

6.198 

(6.014 ; 6.252) 

6.197 

(6.060 ; 6.233) 

03 

6.524 

(6.393 ; 6.574) 

6.522 

(6.450 ; 6.558) 

014 

6.805 

(6.719 ; 6.874) 

6.797 

(6.753 ; 6.854) 

Os 

7.191 

(7.084 ; 7.291) 

7.182 

(7.112 ; 7.271) 

Age 

-0.003 

(-0.007 ; 0.005) 

-0.003 

(-0.007 ; 0.005) 

Drugs 

0.036 

(-0.016 ; 0.110) 

0.038 

(-0.007 ; 0.082) 

Packs 

0.049 

(0.014 ; 0.068) 

0.048 

(0.011 ; 0.067) 

Partners 

0.002 

(-0.003 ; 0.012) 

0.001 

(-0.004 ; 0.011) 

CESD 

-0.005 

(-0.007 ; -0.001) 

-0.005 

(-0.007 ; -0.001) 

TimCsero 

-0.110 

(-0.126 ; -0.084) 

-0.108 

(-0.125 ; -0.080) 


estimates. By matching the results discussed so far with the estimated initial and transition
probabilities, more insight into individual trajectories can be obtained. We report in Table 8
the parameters of the Markov chain estimated under the lqHMM formulation. For $\tau = 0.25$,
it is clear that most patients start the study with a medium/high level of CD4 cell counts
($\delta_4 + \delta_5 > 0.9$). As time passes, the estimated $Q$ matrix highlights a high
variability in the longitudinal trajectories. Transitions between states are quite likely; units
in lower hidden states generally tend to move towards higher baseline values. When analysing
the results obtained for the median response ($\tau = 0.50$), a different evolution of the
response variable seems to emerge. Here, intermediate hidden states are the most likely at the
beginning of the observation window ($\delta_2 + \delta_3 + \delta_4 > 0.85$) and transitions
between states are less frequent than those observed for $\tau = 0.25$ ($q_{hh} > 0.8$,
$\forall h = 1, \dots, m$). If any transition is observed, the probability of moving towards
“lower” states is slightly higher than that of moving towards higher ones.
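As a minimal sketch of how initial and transition probabilities translate into trajectories, the following Python fragment propagates the state distribution over time and sums, for each state, the probability of moving to a lower or to a higher state. The values are copied from the lqHMM estimates at $\tau = 0.50$; the function names `state_distribution` and `off_diagonal_mass` are illustrative and not part of the original analysis.

```python
# Initial and transition probabilities for the lqHMM at tau = 0.50 (m = 5),
# copied from the table of estimated Markov chain parameters.
delta = [0.000, 0.219, 0.360, 0.326, 0.095]
Q = [
    [0.933, 0.067, 0.000, 0.000, 0.000],
    [0.068, 0.847, 0.085, 0.000, 0.000],
    [0.026, 0.086, 0.827, 0.061, 0.000],
    [0.002, 0.065, 0.032, 0.861, 0.040],
    [0.003, 0.027, 0.000, 0.043, 0.927],
]


def step(dist, Q):
    """One step of the chain: the row vector dist times the matrix Q."""
    m = len(dist)
    return [sum(dist[h] * Q[h][k] for h in range(m)) for k in range(m)]


def state_distribution(delta, Q, n_steps):
    """State distribution at occasions 1, ..., n_steps + 1."""
    dists = [list(delta)]
    for _ in range(n_steps):
        dists.append(step(dists[-1], Q))
    return dists


def off_diagonal_mass(Q):
    """Per state, total probability of moving to a lower / a higher state."""
    lower = [sum(row[:h]) for h, row in enumerate(Q)]
    upper = [sum(row[h + 1:]) for h, row in enumerate(Q)]
    return lower, upper


dists = state_distribution(delta, Q, n_steps=5)
lower, upper = off_diagonal_mass(Q)
for h, (lo, up) in enumerate(zip(lower, upper), start=1):
    print(f"state {h}: P(move lower) = {lo:.3f}, P(move higher) = {up:.3f}")
```

Comparing `lower` and `upper` row by row is one simple way to read the direction of the transitions off an estimated $Q$ matrix.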

The analysis of the results obtained under the lqHMM+QLDO specification can help in understanding
the effect of potentially non-ignorable missingness on these results. In Figure 2 we report
the estimated LDO class probabilities obtained under the lqHMM+QLDO specification. It
may be noticed that, for both quantiles, higher classes are associated with increasing time to
drop-out. That is, units staying longer in the study belong to the second LDO class.

Figure 2: LDO class probabilities for $\tau = 0.25$ (left) and $\tau = 0.50$ (right), by time occasion.

We report in Tables 9 and 10 the estimates of the initial and the transition probabilities under the
lqHMM+QLDO specification for the two classes (say LDO1 and LDO2). Initial probability
estimates, for all the analysed quantiles, suggest that the first hidden state is quite unlikely
at the beginning of the study. Units are almost equally distributed over the remaining states.
As regards the transition probability matrices, parameter estimates highlight the presence of
individuals in the sample who experience quite a different progression of the disease over time.
Class LDO1 is characterized by shorter individual sequences and mostly includes subjects who
leave the study prematurely. Within this class, the estimated transitions for $\tau = 0.25$ are
quite similar to those observed for the lqHMM specification. Units with particularly low CD4
count levels move towards “higher” hidden states. The only remarkable difference between
lqHMM and lqHMM+QLDO is related to $q_{11}$ which, under the latter approach, is much higher
($q_{11} = 0.931$ vs $q_{11} = 0.798$). This is probably due to those units in the sample that leave the
study with very low CD4 levels and that, under the MAR approach, are not clearly identified.
When we look at the results for $\tau = 0.50$, the estimated transitions suggest a progressive
reduction in the median response over time. Comparing the results obtained under the MNAR and the
MAR approach, it is clear that such an evolution is better identified when accounting for the
missing data process. In fact, under the LDO specification, the probability of moving towards
the “lowest” state is higher than that observed for the lqHMM and, with probability equal to one,
individuals do not move further. This result helps detect units that drop out prematurely from
the study after experiencing a steep and sudden reduction in CD4 count levels.

Focusing on class LDO2 (ie the class associated with units staying longer in the study),
different longitudinal paths can be observed. When considering the left tail of the response
distribution ($\tau = 0.25$), the first two hidden states are seldom visited and, if any transition
is observed, units move towards “higher” states at the next occasion. The only exception
is the estimate $q_{31} = 0.184$, which is probably associated with some units that experience a
sudden decrease in the CD4 level followed by an increase at the subsequent visit. Regarding
the other hidden states, if any transition is observed, units generally tend to move towards
higher baseline values. A similar path can be observed for the median response, $\tau = 0.50$,
where the estimated $Q$ matrix is almost diagonal, apart from the first hidden state which is,
however, seldom reached. As for $\tau = 0.25$, also in this case, if any transition is observed, it
is generally towards higher intercept values.

Figure 3: Longitudinal trajectories by LDO class, for $\tau = 0.25$ (left) and $\tau = 0.50$ (right).

To support the results we have discussed so far, we report in Figure 3 the longitudinal
trajectories of individuals classified (via a MAP criterion) into LDO1 (left) and LDO2 (right),
for $\tau = 0.25$ and $\tau = 0.50$. Local polynomial regression curves (blue lines), 95% confidence
intervals (gray bands) and mean values (blue dots) are reported. Due to the missing data
process, wider confidence intervals are observed at the last measurement occasions. As expected,
units in class LDO1 leave the study earlier and experience a more evident reduction in
the CD4 counts during the follow-up time. On the other hand, longer longitudinal sequences
and more stable response patterns are observed for those units who are classified in LDO2,
for both $\tau = 0.25$ and $\tau = 0.50$. While we cannot claim that the proposed model is correct
and the lqHMM is not (this is not our aim indeed), we may observe that, by considering an
inhomogeneous hidden Markov representation due to a non-random missing data generating
process, some of the parameter estimates slightly change interpretation and we get a more
complete and coherent picture of the response variable dynamics.
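The MAP criterion used for the classification can be sketched as follows: each unit is assigned to the LDO class with the largest posterior probability, obtained via Bayes' rule from the class probabilities and the class-specific likelihoods of the observed sequence. The fragment below is only an illustration; the numerical inputs are hypothetical and the function names are not taken from the original analysis.

```python
import math


def posterior_class_probs(log_prior, log_lik):
    """Bayes' rule on the log scale: p(g | data) is proportional to prior_g * lik_g."""
    joint = [lp + ll for lp, ll in zip(log_prior, log_lik)]
    top = max(joint)  # subtract the maximum for numerical stability
    weights = [math.exp(j - top) for j in joint]
    total = sum(weights)
    return [w / total for w in weights]


def map_class(probs):
    """1-based index of the class with the largest posterior probability."""
    return max(range(len(probs)), key=probs.__getitem__) + 1


# Hypothetical values for a single unit, for illustration only.
log_prior = [math.log(0.4), math.log(0.6)]  # class probabilities
log_lik = [-12.3, -15.1]                    # class-specific log-likelihoods
post = posterior_class_probs(log_prior, log_lik)
label = map_class(post)
print(post, label)
```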

6 Simulation study 

To evaluate the empirical behaviour of the proposed model, we have performed the following
simulation study. Data have been generated from a Gaussian HMM+QLDO with $m = 4$ states
and $G = 2$ LDO classes. For the missing data model, we have considered the following set
of model parameters: $\lambda = (4.41, -0.63)$. Based on these values, “higher” LDO classes are
associated with longer longitudinal sequences. Initial probabilities for the hidden Markov chain
have been fixed to $\delta = (0.05, 0.39, 0.48, 0.08)$, while transition probabilities have been set equal
to

$$
Q(1) = \begin{pmatrix}
1.00 & 0.00 & 0.00 & 0.00 \\
0.27 & 0.73 & 0.00 & 0.00 \\
0.00 & 0.23 & 0.71 & 0.06 \\
0.05 & 0.06 & 0.00 & 0.89
\end{pmatrix},
\qquad
Q(2) = \begin{pmatrix}
0.91 & 0.09 & 0.00 & 0.00 \\
0.05 & 0.92 & 0.03 & 0.00 \\
0.02 & 0.03 & 0.94 & 0.01 \\
0.00 & 0.00 & 0.01 & 0.99
\end{pmatrix}.
$$

Based on these parameter values, individuals belonging to the first LDO class move towards
“lower” hidden states with a higher probability than units belonging to the second class. Here,
we have decided to reduce the distance between the transition probability matrices associated
with the LDO classes when compared to those estimated in the real data application. This has
been done to verify the ability of the estimation algorithm to recover the “true” latent
structure. As regards the longitudinal observations, the covariates available for the CD4 dataset
have been used directly. The following set of fixed parameters has been considered:
$\beta_{\text{timeSero}} = -0.088$, $\beta_{\text{age}} = 0.006$, $\beta_{\text{drugs}} = 0.148$,
$\beta_{\text{packs}} = 0.055$, $\beta_{\text{partners}} = 0.009$, $\beta_{\text{cesd}} = -0.004$; on the
other hand, state-specific random intercepts have been set to $\alpha = \{5.861, 6.306, 6.650, 7.039\}$.
Based on these parameters, we have simulated the response variable from a Gaussian
distribution with variance equal to 0.23, corresponding to the variance of the ALD density estimated
in the real data application for $\tau = 0.50$. Mean values have been defined according to the
following model

$$
\mu_{it}(s_{it}) = \alpha(s_{it}) + x_{it}' \beta.
$$

We have considered $B = 200$ samples and estimated a lqHMM+QLDO for different quantiles,
$\tau \in \{0.25, 0.50\}$, and for different choices of $m$ and $G$, $m \in \{3, 4, 5\}$ and $G \in \{1, 2, 3\}$.
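The data-generating mechanism for the longitudinal part can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code used for the study: the LDO class of each unit is passed in as given (the missing data model is not reproduced), the covariate contribution $x_{it}'\beta$ defaults to zero unless supplied through the hypothetical argument `xb`, and all function names are illustrative.

```python
import random

# Simulation design: initial probabilities, class-specific transition
# matrices, state-specific intercepts and Gaussian variance.
delta = [0.05, 0.39, 0.48, 0.08]
Q = {
    1: [[1.00, 0.00, 0.00, 0.00],
        [0.27, 0.73, 0.00, 0.00],
        [0.00, 0.23, 0.71, 0.06],
        [0.05, 0.06, 0.00, 0.89]],
    2: [[0.91, 0.09, 0.00, 0.00],
        [0.05, 0.92, 0.03, 0.00],
        [0.02, 0.03, 0.94, 0.01],
        [0.00, 0.00, 0.01, 0.99]],
}
alpha = [5.861, 6.306, 6.650, 7.039]
sigma2 = 0.23


def draw(probs, rng):
    """Draw a 0-based index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def simulate_unit(ldo_class, n_times, rng, xb=None):
    """Simulate hidden states and Gaussian responses for one unit.

    xb is an optional list with the linear predictor x_it' beta at each
    occasion (assumed zero when omitted, a simplification of this sketch).
    """
    xb = xb if xb is not None else [0.0] * n_times
    states, y = [], []
    s = draw(delta, rng)
    for t in range(n_times):
        if t > 0:
            s = draw(Q[ldo_class][s], rng)  # class-specific transition
        states.append(s)
        y.append(rng.gauss(alpha[s] + xb[t], sigma2 ** 0.5))
    return states, y


rng = random.Random(1)
states, y = simulate_unit(ldo_class=2, n_times=6, rng=rng)
print(states, [round(v, 2) for v in y])
```

Since the first state is absorbing under Q(1), sequences simulated for the first class never leave it once it is reached, which mirrors the faster decline postulated for that class.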

The bias and the standard deviation of parameter estimates for the longitudinal data model
estimated for fixed $m = 4$ and $G = 2$ are reported in Table 4. As expected, a higher bias is
observed for the parameters related to the hidden Markov chain when compared to the fixed
effect estimates. The quality of the results deteriorates (that is, bias and sd tend to increase) when
considering the left tail of the response distribution, as this represents a low density region with
reduced information.
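For reference, the two summaries can be computed from the replicated estimates as in the following sketch. The input values are hypothetical, and the use of the sample standard deviation (denominator $B - 1$) is an assumption, since the denominator is not stated in the text.

```python
import statistics


def bias_and_sd(estimates, true_value):
    """Monte Carlo bias (mean estimate minus true value) and standard deviation."""
    bias = statistics.mean(estimates) - true_value
    sd = statistics.stdev(estimates)  # sample sd, denominator B - 1 (assumption)
    return bias, sd


# Hypothetical estimates of the age effect over five replicates (true value 0.006).
estimates = [0.005, 0.007, 0.006, 0.004, 0.008]
bias, sd = bias_and_sd(estimates, true_value=0.006)
print(f"bias = {bias:.4f}, sd = {sd:.4f}")
```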

We report in Tables 5 and 6 the bias and the standard deviation (within brackets) of the estimated
transition probability matrices for the LDO classes considering $\tau = 0.25$ and $\tau = 0.50$,
respectively. For both quantiles, parameters are estimated with good accuracy in terms of bias and
(relatively) low variability, whatever the LDO class and the hidden state.

Table 4: Bias and standard deviation of longitudinal model parameters for the lqHMM+QLDO
with $m = 4$ and $G = 2$, $\tau \in \{0.25, 0.50\}$.

                                 $\tau = 0.25$          $\tau = 0.50$
Parameter                        Bias       Sd          Bias       Sd
$\alpha_1$                       0.0099    0.0027       0.0102    0.0013
$\alpha_2$                       0.0150    0.0008       0.0134    0.0034
$\alpha_3$                       0.0222    0.0018       0.0109    0.0018
$\alpha_4$                       0.0230    0.0204       0.0050    0.0075
$\beta_{\text{timeSero}}$       -0.0017    0.0026       0.0011    0.0009
$\beta_{\text{age}}$            -0.0004    0.0004      -0.0002    0.0001
$\beta_{\text{drugs}}$          -0.0073    0.0027      -0.0118    0.0031
$\beta_{\text{packs}}$           0.0005    0.0010       0.0002    0.0012
$\beta_{\text{partners}}$       -0.0006    0.0007       0.0001    0.0003
$\beta_{\text{cesd}}$            0.0000    0.0001       0.0000    0.0001

Last, in Table 7 we show the distribution of the estimated number of hidden states and
LDO classes, using the AIC and the BIC criteria. As is clear, AIC outperforms BIC in
recovering the true number of states and classes. In fact, BIC tends to heavily penalize highly
parametrized models. In the present context, for both quantiles, the BIC index suggests
adopting a lqHMM, that is a lqHMM+QLDO with a single LDO class (ie $G = 1$). On the contrary,
AIC seems to recover the real model structure with high accuracy and should be considered
a better choice to estimate $m$ and $G$. Surprisingly, when comparing $\tau = 0.25$ and $\tau = 0.50$,
slightly better results with respect to the choice of $[m, G]$ are obtained in the former case. AIC
always identifies the right model for $\tau = 0.25$, while some anomalies have been observed for
$\tau = 0.50$, where, in 11% of samples, a further hidden state is selected. This is probably due to
a more extreme behaviour in terms of state-specific locations, which can seldom be observed at
$\tau = 0.25$.
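The different behaviour of the two criteria follows from their penalties: AIC adds 2 per parameter, BIC adds $\log n$ per parameter, which is larger whenever $n > e^2 \approx 7.4$. The sketch below uses the standard definitions with hypothetical log-likelihoods, parameter counts and sample size (none of them taken from the simulation study) to show how the two criteria may disagree.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: -2 log-likelihood + 2 k."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n):
    """Bayesian information criterion: -2 log-likelihood + k log(n)."""
    return -2.0 * loglik + n_params * math.log(n)


# Hypothetical candidates: (m, G) -> (maximised log-likelihood, number of parameters).
candidates = {
    (4, 1): (-1510.0, 25),
    (4, 2): (-1490.0, 38),
}
n = 500  # hypothetical sample size entering the BIC penalty

best_aic = min(candidates, key=lambda k: aic(*candidates[k]))
best_bic = min(candidates, key=lambda k: bic(*candidates[k], n))
print("AIC selects", best_aic, "- BIC selects", best_bic)
```

With these illustrative numbers the gain in log-likelihood from the second LDO class outweighs the AIC penalty but not the BIC one, which is the pattern described above.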

To summarize, the results we have obtained highlight the effectiveness of the estimation algorithm
in recovering the “true”, underlying, model structure. The quality of the parameter estimates
obtained in this simulation study suggests that the results presented for the
CD4 data analysis may be considered quite reliable. The proposed model can be seen as
a valid and flexible approach to handle informative missing data patterns while controlling
for time-varying sources of unobserved heterogeneity in longitudinal profiles. While the choice
of letting $Q$ vary with the LDO class may lead to a substantial increase in the number of
parameters, it may help describe the changes in the behaviour of units with a (possibly)
different propensity to drop out from the study.
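To give a rough idea of this increase, the following sketch counts the free parameters of the hidden Markov chain only. It assumes initial probabilities shared across LDO classes and unconstrained row-stochastic transition matrices, one per class; parameters of the longitudinal and missing data models are not counted, and the function name is illustrative.

```python
def markov_chain_params(m, G):
    """Free parameters: (m - 1) initial probabilities plus G matrices with m (m - 1) each."""
    return (m - 1) + G * m * (m - 1)


for G in (1, 2, 3):
    print(f"m = 5, G = {G}: {markov_chain_params(5, G)} Markov chain parameters")
```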

Table 5: Simulation results. Bias and standard deviation (within brackets) of transition
probability matrices for the lqHMM+QLDO with $m = 4$ and $G = 2$, $\tau = 0.25$.

          1                2                3                4
LDO1
1        -0.002 (0.00)     0.002 (0.00)     0.000 (0.00)     0.000 (0.00)
2        -0.041 (0.01)     0.041 (0.01)     0.000 (0.00)     0.000 (0.00)
3         0.000 (0.00)    -0.074 (0.03)     0.062 (0.06)     0.011 (0.03)
4         0.002 (0.02)    -0.032 (0.02)     0.000 (0.00)     0.030 (0.03)
LDO2
1         0.017 (0.00)    -0.017 (0.00)     0.000 (0.00)     0.000 (0.00)
2         0.012 (0.02)    -0.009 (0.02)    -0.003 (0.00)     0.000 (0.00)
3         0.007 (0.01)    -0.006 (0.01)     0.004 (0.01)    -0.005 (0.00)
4         0.005 (0.00)     0.024 (0.01)    -0.004 (0.00)    -0.025 (0.02)

7 Conclusions

Quantile regression models represent an interesting alternative to standard mean regression
when the researcher’s interest lies in the tails of the response variable distribution and/or
potential outliers in the data may affect the mean values. When responses are repeatedly measured
over time on the same sample units, dependence between observations has to be taken into
consideration to ensure the validity of inferential conclusions. In the presence of a potentially
informative missing data mechanism, standard statistical tools may lead to biased parameter
estimates due to the “selection” of units remaining under observation. In this paper, we have
proposed a linear quantile hidden Markov model with drop-out dependent transitions. Within
this framework, we obtain a more detailed picture of the response variable distribution and,
jointly, address the problem of potentially non-ignorable missingness. In more detail, the latent
drop-out class variable allows us to capture (time-invariant) unobserved sources of heterogeneity
shared by individuals with a similar propensity to drop out. Such propensities lead to
different transitions across the states of the hidden Markov chain; the marginal model for the
longitudinal response is, therefore, given by a finite mixture of lqHMMs.
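The finite mixture structure can be illustrated with a short sketch of the likelihood contribution of a single unit: a forward recursion with asymmetric Laplace (ALD) state-dependent densities is run once per LDO class, and the results are averaged with the class probabilities. This is a simplified illustration and not the estimation code: covariates are omitted, the class probabilities are taken as given inputs, no scaling is applied (so long sequences may underflow), and all function names are hypothetical.

```python
import math


def ald_pdf(y, mu, sigma, tau):
    """Asymmetric Laplace density with location mu, scale sigma and skewness tau."""
    u = (y - mu) / sigma
    rho = u * (tau - (1.0 if u < 0 else 0.0))  # quantile check function
    return tau * (1.0 - tau) / sigma * math.exp(-rho)


def hmm_likelihood(y, delta, Q, alpha, sigma, tau):
    """Forward recursion for one sequence under a single transition matrix Q."""
    m = len(delta)
    fwd = [delta[h] * ald_pdf(y[0], alpha[h], sigma, tau) for h in range(m)]
    for obs in y[1:]:
        fwd = [
            sum(fwd[h] * Q[h][k] for h in range(m)) * ald_pdf(obs, alpha[k], sigma, tau)
            for k in range(m)
        ]
    return sum(fwd)


def mixture_likelihood(y, class_probs, delta, Q_by_class, alpha, sigma, tau):
    """Finite mixture over LDO classes of class-specific HMM likelihoods."""
    return sum(
        p * hmm_likelihood(y, delta, Q, alpha, sigma, tau)
        for p, Q in zip(class_probs, Q_by_class)
    )


# Toy example with two states and two classes (all values hypothetical).
y_obs = [6.1, 6.4, 6.0]
delta_toy = [0.5, 0.5]
Q_toy = [[[0.9, 0.1], [0.2, 0.8]], [[0.6, 0.4], [0.4, 0.6]]]
lik = mixture_likelihood(y_obs, [0.4, 0.6], delta_toy, Q_toy, [6.0, 6.5], 0.3, 0.5)
print(lik)
```

When all classes share the same transition matrix, the mixture collapses to a single lqHMM, which corresponds to the case $G = 1$.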

We have re-analysed a benchmark dataset and compared the results obtained under the
standard lqHMM by Farcomeni (2012) with those from the proposed approach. Although
the number of parameters increases substantially with the proposed approach, a clearer description
of the observed data is obtained; this makes the proposed methodology an interesting and
valuable alternative to existing modelling approaches.

Table 6: Bias and standard deviation (within brackets) of transition probability matrices for
the lqHMM+QLDO with $m = 4$ and $G = 2$, $\tau = 0.50$.

          1                2                3                4
LDO1
1        -0.003 (0.00)     0.003 (0.00)     0.000 (0.00)     0.000 (0.00)
2        -0.042 (0.02)     0.042 (0.02)     0.000 (0.00)     0.000 (0.00)
3         0.007 (0.00)    -0.054 (0.02)     0.032 (0.03)     0.014 (0.01)
4        -0.004 (0.01)    -0.034 (0.01)     0.000 (0.00)     0.038 (0.03)
LDO2
1         0.027 (0.01)    -0.027 (0.01)     0.000 (0.00)     0.000 (0.00)
2         0.015 (0.01)    -0.008 (0.01)    -0.007 (0.00)     0.000 (0.00)
3         0.004 (0.00)    -0.002 (0.01)    -0.001 (0.01)    -0.001 (0.00)
4         0.007 (0.00)     0.026 (0.01)    -0.004 (0.00)    -0.029 (0.01)

Table 7: Values of $m$ and $G$ estimated with BIC and AIC, $\tau \in \{0.25, 0.50\}$.

                      BIC                          AIC
              G = 1   G = 2   G = 3        G = 1   G = 2   G = 3
$\tau = 0.25$
m = 3          0.00    0.00    0.00         0.00    0.00    0.00
m = 4          0.99    0.01    0.00         0.00    1.00    0.00
m = 5          0.00    0.00    0.00         0.00    0.00    0.00
$\tau = 0.50$
m = 3          0.00    0.00    0.00         0.00    0.00    0.00
m = 4          0.97    0.03    0.00         0.00    0.89    0.00
m = 5          0.00    0.00    0.00         0.00    0.11    0.00

Table 8: Estimated initial and transition probabilities at different quantiles for the lqHMM, $m = 5$.

          1                 2                 3                 4                 5
$\tau = 0.25$
$\delta$  0.002             0.033             0.333             0.426             0.206
          (0.000 ; 0.009)   (0.000 ; 0.070)   (0.231 ; 0.431)   (0.342 ; 0.529)   (0.100 ; 0.300)
1         0.798             0.040             0.129             0.000             0.033
          (0.374 ; 1.000)   (0.000 ; 0.273)   (0.000 ; 0.501)   (0.000 ; 0.464)   (0.000 ; 0.184)
2         0.137             0.660             0.203             0.000             0.000
          (0.067 ; 0.208)   (0.436 ; 0.778)   (0.090 ; 0.429)   (0.000 ; 0.029)   (0.000 ; 0.020)
3         0.004             0.137             0.689             0.155             0.015
          (0.000 ; 0.028)   (0.080 ; 0.195)   (0.568 ; 0.787)   (0.093 ; 0.250)   (0.000 ; 0.046)
4         0.009             0.035             0.158             0.744             0.055
          (0.000 ; 0.017)   (0.000 ; 0.070)   (0.100 ; 0.232)   (0.656 ; 0.808)   (0.021 ; 0.109)
5         0.000             0.008             0.045             0.050             0.896
          (0.000 ; 0.005)   (0.000 ; 0.026)   (0.002 ; 0.087)   (0.000 ; 0.109)   (0.839 ; 0.955)
$\tau = 0.50$
$\delta$  0.000             0.219             0.360             0.326             0.095
          (0.000 ; 0.000)   (0.060 ; 0.310)   (0.238 ; 0.499)   (0.202 ; 0.441)   (0.042 ; 0.149)
1         0.933             0.067             0.000             0.000             0.000
          (0.802 ; 1.000)   (0.000 ; 0.198)   (0.000 ; 0.000)   (0.000 ; 0.000)   (0.000 ; 0.000)
2         0.068             0.847             0.085             0.000             0.000
          (0.031 ; 0.126)   (0.742 ; 0.920)   (0.004 ; 0.179)   (0.000 ; 0.000)   (0.000 ; 0.000)
3         0.026             0.086             0.827             0.061             0.000
          (0.000 ; 0.066)   (0.030 ; 0.163)   (0.718 ; 0.902)   (0.002 ; 0.135)   (0.000 ; 0.002)
4         0.002             0.065             0.032             0.861             0.040
          (0.000 ; 0.018)   (0.011 ; 0.106)   (0.000 ; 0.105)   (0.805 ; 0.910)   (0.012 ; 0.072)
5         0.003             0.027             0.000             0.043             0.927
          (0.000 ; 0.017)   (0.000 ; 0.070)   (0.000 ; 0.045)   (0.000 ; 0.115)   (0.857 ; 0.983)



Table 9: Estimated initial and transition probabilities for the lqHMM+QLDO, $m = 5$, $G = 2$ and LDO1.

          1                 2                 3                 4                 5
$\tau = 0.25$
$\delta$  0.006             0.197             0.259             0.269             0.269
          (0.000 ; 0.019)   (0.153 ; 0.232)   (0.228 ; 0.292)   (0.243 ; 0.300)   (0.224 ; 0.322)
1         0.931             0.069             0.000             0.000             0.000
          (0.715 ; 1.000)   (0.000 ; 0.241)   (0.000 ; 0.000)   (0.000 ; 0.000)   (0.000 ; 0.102)
2         0.088             0.663             0.239             0.011             0.000
          (0.049 ; 0.132)   (0.525 ; 0.767)   (0.124 ; 0.384)   (0.000 ; 0.053)   (0.000 ; 0.000)
3         0.015             0.144             0.704             0.137             0.000
          (0.000 ; 0.039)   (0.089 ; 0.229)   (0.546 ; 0.782)   (0.072 ; 0.260)   (0.000 ; 0.016)
4         0.000             0.080             0.087             0.772             0.062
          (0.000 ; 0.020)   (0.011 ; 0.153)   (0.011 ; 0.250)   (0.576 ; 0.860)   (0.001 ; 0.151)
5         0.005             0.021             0.123             0.042             0.809
          (0.000 ; 0.015)   (0.000 ; 0.089)   (0.004 ; 0.213)   (0.000 ; 0.159)   (0.681 ; 0.907)
$\tau = 0.50$
$\delta$  0.000             0.200             0.332             0.363             0.104
          (0.000 ; 0.004)   (0.073 ; 0.299)   (0.200 ; 0.479)   (0.229 ; 0.449)   (0.062 ; 0.153)
1         1.000             0.000             0.000             0.000             0.000
          (0.917 ; 1.000)   (0.000 ; 0.083)   (0.000 ; 0.000)   (0.000 ; 0.000)   (0.000 ; 0.000)
2         0.138             0.793             0.069             0.000             0.000
          (0.056 ; 0.222)   (0.661 ; 0.898)   (0.000 ; 0.215)   (0.000 ; 0.000)   (0.000 ; 0.018)
3         0.036             0.240             0.724             0.000             0.000
          (0.000 ; 0.118)   (0.088 ; 0.404)   (0.183 ; 0.846)   (0.000 ; 0.461)   (0.000 ; 0.000)
4         0.000             0.116             0.123             0.721             0.040
          (0.000 ; 0.022)   (0.001 ; 0.275)   (0.000 ; 0.302)   (0.551 ; 0.826)   (0.000 ; 0.114)
5         0.010             0.107             0.057             0.000             0.826
          (0.000 ; 0.039)   (0.000 ; 0.223)   (0.000 ; 0.277)   (0.000 ; 0.364)   (0.527 ; 0.955)




Table 10: Estimated initial and transition probabilities for the lqHMM+QLDO, $m = 5$, $G = 2$ and LDO2.

          1                 2                 3                 4                 5
$\tau = 0.25$
$\delta$  0.006             0.197             0.259             0.269             0.269
          (0.000 ; 0.019)   (0.153 ; 0.232)   (0.228 ; 0.292)   (0.243 ; 0.300)   (0.224 ; 0.322)
1         0.000             0.000             1.000             0.000             0.000
          (0.000 ; 0.000)   (0.000 ; 0.000)   (1.000 ; 1.000)   (0.000 ; 0.000)   (0.000 ; 0.000)
2         0.000             0.000             0.000             0.726             0.274
          (0.000 ; 0.000)   (0.000 ; 0.160)   (0.000 ; 0.047)   (0.000 ; 1.000)   (0.000 ; 1.000)
3         0.184             0.000             0.816             0.000             0.000
          (0.000 ; 0.994)   (0.000 ; 0.000)   (0.000 ; 1.000)   (0.000 ; 0.124)   (0.000 ; 0.000)
4         0.007             0.064             0.046             0.763             0.121
          (0.000 ; 0.052)   (0.000 ; 0.177)   (0.000 ; 0.197)   (0.035 ; 0.906)   (0.003 ; 0.754)
5         0.000             0.006             0.000             0.005             0.989
          (0.000 ; 0.000)   (0.000 ; 0.168)   (0.000 ; 0.024)   (0.000 ; 0.150)   (0.765 ; 1.000)
$\tau = 0.50$
$\delta$  0.000             0.200             0.332             0.363             0.104
          (0.000 ; 0.004)   (0.073 ; 0.299)   (0.200 ; 0.479)   (0.229 ; 0.449)   (0.062 ; 0.153)
1         0.515             0.485             0.000             0.000             0.000
          (0.000 ; 1.000)   (0.079 ; 1.000)   (0.000 ; 0.000)   (0.000 ; 0.000)   (0.000 ; 0.000)
2         0.000             0.919             0.081             0.000             0.000
          (0.000 ; 0.062)   (0.438 ; 1.000)   (0.000 ; 0.721)   (0.000 ; 0.032)   (0.000 ; 0.000)
3         0.018             0.011             0.900             0.071             0.000
          (0.000 ; 0.058)   (0.000 ; 0.111)   (0.763 ; 0.975)   (0.003 ; 0.248)   (0.000 ; 0.000)
4         0.000             0.020             0.021             0.919             0.040
          (0.000 ; 0.020)   (0.000 ; 0.055)   (0.000 ; 0.089)   (0.862 ; 0.968)   (0.000 ; 0.087)
5         0.000             0.000             0.000             0.039             0.961
          (0.000 ; 0.021)   (0.000 ; 0.028)   (0.000 ; 0.000)   (0.000 ; 0.130)   (0.889 ; 1.000)